\documentclass{article}
\usepackage[margin=1in]{geometry}
\pagestyle{empty}
\title{Assignment 1 for CCHL 7001H: 	Longitudinal Data Analysis}
\author{Patrick Brown}
\date{Due date: 10 October, 5pm}

\begin{document}


\maketitle

The ``ageing'' data on the course web site are taken from Rabbitt \emph{et al.} (2001), \emph{Neuropsychologia}, {\bfseries 39}:5. Subjects in the study were asked to complete the Mill Hill Vocabulary Test on up to three occasions over a number of years. The test has two parts:

  
  \begin{itemize}

  \item Part A: identify a synonym for a given word from six alternatives, for example:

    \textbf{~1. mingle}: interfere; mix; declare; press; gamble; remark

    \textbf{34. ambit}: talisman; confines; armature; arc; cambre; ideal

  \item Part B: ``write down in a few words the meaning of each of the
  following words'', for example:

  \textbf{~3. stubborn} \rule{4cm}{0.3pt}
  
  \textbf{32. sedulous} \rule{4cm}{0.3pt}
  \end{itemize}

The columns of the ``ageing'' dataset contain 1) the identification number of the subject, 2) the city in which the observation was made, 3) the sex of the subject, 4) the social class of the subject, 5) the age of the subject on the day of the observation, 6) the score on part A of the Mill Hill vocabulary test and 7) the score on part B of the Mill Hill vocabulary test. The social class codes are (1) professional, (2) intermediate, (3 N) non-manual skilled, (3 M) manual skilled, (4) partly skilled and (5) unskilled.




\begin{enumerate}
	\item It is hypothesised that, since part B of the test is more difficult, the effect of ageing will be more pronounced on part B scores than on part A scores. The clinicians believe that cognitive decline sets in at age 65 and follows a linear decreasing trend thereafter. Social class is an important confounder which must be taken into account, but it is believed that there is no interaction between age and social class. Fit separate models to the part A and part B scores, as all the parameters are believed to differ between the two tests. Assess the hypothesis that the effect of ageing is more pronounced on part B than on part A. Justify your choice of model and show one or two graphs to illustrate your conclusion.
	[10 pts]
	
Hint: \verb!data$newage = pmax(data$age, 65)!
	
	\item There is a lot of variability in test scores. Is this because there are big differences in people's abilities, or because the test itself can give different results even when two subjects have the same ability? [5 pts]

\item The clinicians wish to identify the individuals with the highest cognitive abilities after adjusting for age and social class. They would like a list of individuals who, on average, can be expected to score at least 5 points higher on part B than the population average for their age and social class. Find the 10 subjects with the highest conditional probability of being at least 5 points above average. Do not use a serially correlated term in your model for this part, even if you used one earlier. Explain your work. [5 pts]



\end{enumerate}
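
As an illustration of the hint to Question~1, the piecewise age term can be written out as below. The coefficients $\beta_0$ and $\beta_1$ are notation introduced here only for illustration; the other covariates and the random terms are omitted.
\[
\mathrm{newage} = \max(\mathrm{age}, 65) = \left\{ \begin{array}{ll} 65 & \mbox{if } \mathrm{age} < 65,\\ \mathrm{age} & \mbox{if } \mathrm{age} \geq 65, \end{array} \right.
\]
so that a mean of the form $\beta_0 + \beta_1\,\mathrm{newage}$ becomes
\[
\beta_0 + \beta_1\,\mathrm{newage} = \left\{ \begin{array}{ll} \beta_0 + 65\,\beta_1 & \mbox{if } \mathrm{age} < 65,\\ \beta_0 + \beta_1\,\mathrm{age} & \mbox{if } \mathrm{age} \geq 65. \end{array} \right.
\]
The mean is constant before age 65, continuous at 65, and changes by $\beta_1$ points per year afterwards, so a negative $\beta_1$ corresponds to the hypothesised linear decline.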


Your report should contain at most 2 pages of writing, in addition to one or two graphs and at most 5 tables.

\end{document}

alldata = read.table('M:/admin/teaching/spatial/assessment/data/ageing.txt', header=TRUE)
# keep only subjects observed on at least two occasions
nobs = table(alldata$Subject)
keep = names(nobs[nobs >= 2])
# get rid of subject 70, who has the same age at both tests
# (repeated ages give zero distances, which corExp cannot handle)
keep = keep[keep != '70']
data = alldata[alldata$Subject %in% keep, ]
rownames(data) = NULL
write.table(data, 'M:/admin/teaching/spatial/assessment/data/ageing2.txt')
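# piecewise age term: constant at 65 before age 65 and equal to age afterwards,
# so its coefficient is the rate of change after 65 (as in the hint to Question 1)
data$age2 = pmax(data$age, 65)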
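library(nlme)
# part A: random intercept per subject plus exponential serial correlation in age
# with a nugget; social class enters additively (no age by social class interaction)
summary(lme(mha ~ age2 + soc.class + sex, random = ~1|Subject, data=data,
	correlation = corExp(form = ~age|Subject, nugget=TRUE)))
# part B: same model structure, fitted separately so all parameters can differ
summary(lme(mhb ~ age2 + soc.class + sex, random = ~1|Subject, data=data,
	correlation = corExp(form = ~age|Subject, nugget=TRUE)))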